{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Fairness Exercise 2: Remediate Bias",
      "provenance": [],
      "collapsed_sections": [
        "dV10ovJif-A3",
        "Amp8qxOA7d5R",
        "KJob8ZofVnUC",
        "I-SJbapBEXPe"
      ]
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "HnU0fNSuG2aD"
      },
      "source": [
        "# Fairness Exercise 2: Remediate Bias"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "iVkPBosnIFlu"
      },
      "source": [
        "**Learning Objectives:**\n",
        "* Remediate subgroup bias in the toxic text classifier by upweighting negative examples.\n",
        "* Re-evaluate the revised model to confirm successful remediation using Fairness Indicators and the What-If tool\n",
        "\n",
        "**Prerequisites**\n",
        "\n",
        "This exercise builds on [**Fairness Exercise 1: Explore the Model**](https://colab.research.google.com/github/google/eng-edu/blob/master/ml/pc/exercises/fairness_text_toxicity_part1.ipynb?utm_source=external-colab&utm_campaign=colab-external&utm_medium=referral&utm_content=fairnessexercise1-colab). It is strongly recommended that you complete **Fairness Exercise 1** prior to working through this exercise.\n",
        "\n",
        "## Overview\n",
        "\n",
        "In [**Fairness Exercise 1: Explore the Model**](https://colab.research.google.com/github/google/eng-edu/blob/master/ml/pc/exercises/fairness_text_toxicity_part1.ipynb?utm_source=external-colab&utm_campaign=colab-external&utm_medium=referral&utm_content=fairnessexercise1-colab), you trained a toxicity classifier on the Civil Comments dataset and used Fairness Indicators to identify some unintended bias issues related to gender. In this exercise, you'll apply remediation techniques and retrain the model to mitigate this bias. You'll then use Fairness Indicators and the What-If tool to evaluate the results and confirm that the remediation efforts were successful."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AjxPPFCaEy__",
        "colab_type": "text"
      },
      "source": [
        "## Setup\n",
        "\n",
        "First, run the cell below to install Fairness Indicators. \n",
        "\n",
        "**NOTE:** You **MUST RESTART** the Colab runtime after doing this installation, either by clicking the **RESTART RUNTIME** button at the bottom of this cell or by selecting **Runtime->Restart runtime...** from the menu bar above."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "YgvGsw5_E_X5",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "!pip install fairness-indicators \\\n",
        "  \"absl-py==0.8.0\" \\\n",
        "  \"pyarrow==0.15.1\" \\\n",
        "  \"apache-beam==2.17.0\" \\\n",
        "  \"avro-python3==1.9.1\" \\\n",
        "  \"tfx-bsl==0.21.4\" \\\n",
        "  \"tensorflow-data-validation==0.21.5\""
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "u1TGRRQWFL8J",
        "colab_type": "text"
      },
      "source": [
        "Next, import all the dependencies we'll use in this exercise, which include Fairness Indicators, TensorFlow Model Analysis (tfma), and the What-If tool (WIT):"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "jJpmSHngFaGo",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "%tensorflow_version 2.x\n",
        "import os\n",
        "import tempfile\n",
        "import apache_beam as beam\n",
        "import numpy as np\n",
        "import pandas as pd\n",
        "from datetime import datetime\n",
        "\n",
        "import tensorflow_hub as hub\n",
        "import tensorflow as tf\n",
        "import tensorflow_model_analysis as tfma\n",
        "from tensorflow_model_analysis.addons.fairness.post_export_metrics import fairness_indicators\n",
        "from tensorflow_model_analysis.addons.fairness.view import widget_view\n",
        "\n",
        "from witwidget.notebook.visualization import WitConfigBuilder\n",
        "from witwidget.notebook.visualization import WitWidget"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "iJFtnjMiFbVc",
        "colab_type": "text"
      },
      "source": [
        "Run the following code to download and import the training and validation datasets. By default, the following code will load the preprocessed data (see [**Fairness Exercise 1: Explore the Model**](https://colab.research.google.com/github/google/eng-edu/blob/master/ml/pc/exercises/fairness_text_toxicity_part1.ipynb?utm_source=external-colab&utm_campaign=colab-external&utm_medium=referral&utm_content=fairnessexercise1-colab) for more details). If you prefer, you can enable the `download_original_data` checkbox at right to download the original dataset and preprocess it as described in the previous section (this may take 5-10 minutes)."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "dOTdyXkKF7KS",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "download_original_data = False #@param {type:\"boolean\"}\n",
        "\n",
        "if download_original_data:\n",
        "  train_tf_file = tf.keras.utils.get_file('train_tf.tfrecord',\n",
        "                                          'https://storage.googleapis.com/civil_comments_dataset/train_tf.tfrecord')\n",
        "  validate_tf_file = tf.keras.utils.get_file('validate_tf.tfrecord',\n",
        "                                             'https://storage.googleapis.com/civil_comments_dataset/validate_tf.tfrecord')\n",
        "\n",
        "  # The identity terms list will be grouped together by their categories\n",
        "  # (see 'IDENTITY_COLUMNS') on threshould 0.5. Only the identity term column,\n",
        "  # text column and label column will be kept after processing.\n",
        "  train_tf_file = util.convert_comments_data(train_tf_file)\n",
        "  validate_tf_file = util.convert_comments_data(validate_tf_file)\n",
        "\n",
        "else:\n",
        "  train_tf_file = tf.keras.utils.get_file('train_tf_processed.tfrecord',\n",
        "                                          'https://storage.googleapis.com/civil_comments_dataset/train_tf_processed.tfrecord')\n",
        "  validate_tf_file = tf.keras.utils.get_file('validate_tf_processed.tfrecord',\n",
        "                                             'https://storage.googleapis.com/civil_comments_dataset/validate_tf_processed.tfrecord')"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "T8Kj_0KuHVpK",
        "colab_type": "text"
      },
      "source": [
        "Next, train the original model from [**Fairness Exercise 1: Explore the Model**](https://colab.research.google.com/github/google/eng-edu/blob/master/ml/pc/exercises/fairness_text_toxicity_part1.ipynb?utm_source=external-colab&utm_campaign=colab-external&utm_medium=referral&utm_content=fairnessexercise1-colab), which we'll use as the baseline model for this exercise:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "sLQToWXgLHlh",
        "colab_type": "code",
        "cellView": "form",
        "colab": {}
      },
      "source": [
        "#@title Run this cell to train the baseline model from Exercise 1\n",
        "TEXT_FEATURE = 'comment_text'\n",
        "LABEL = 'toxicity'\n",
        "\n",
        "FEATURE_MAP = {\n",
        "    # Label:\n",
        "    LABEL: tf.io.FixedLenFeature([], tf.float32),\n",
        "    # Text:\n",
        "    TEXT_FEATURE:  tf.io.FixedLenFeature([], tf.string),\n",
        "\n",
        "    # Identities:\n",
        "    'sexual_orientation':tf.io.VarLenFeature(tf.string),\n",
        "    'gender':tf.io.VarLenFeature(tf.string),\n",
        "    'religion':tf.io.VarLenFeature(tf.string),\n",
        "    'race':tf.io.VarLenFeature(tf.string),\n",
        "    'disability':tf.io.VarLenFeature(tf.string),\n",
        "}\n",
        "\n",
        "def train_input_fn():\n",
        "  def parse_function(serialized):\n",
        "    parsed_example = tf.io.parse_single_example(\n",
        "        serialized=serialized, features=FEATURE_MAP)\n",
        "    # Adds a weight column to deal with unbalanced classes.\n",
        "    parsed_example['weight'] = tf.add(parsed_example[LABEL], 0.1)\n",
        "    return (parsed_example,\n",
        "            parsed_example[LABEL])\n",
        "  train_dataset = tf.data.TFRecordDataset(\n",
        "      filenames=[train_tf_file]).map(parse_function).batch(512)\n",
        "  return train_dataset\n",
        "\n",
        "BASE_DIR = tempfile.gettempdir()\n",
        "\n",
        "model_dir = os.path.join(BASE_DIR, 'train', datetime.now().strftime(\n",
        "    \"%Y%m%d-%H%M%S\"))\n",
        "\n",
        "embedded_text_feature_column = hub.text_embedding_column(\n",
        "    key=TEXT_FEATURE,\n",
        "    module_spec='https://tfhub.dev/google/nnlm-en-dim128/1')\n",
        "\n",
        "classifier = tf.estimator.DNNClassifier(\n",
        "    hidden_units=[500, 100],\n",
        "    weight_column='weight',\n",
        "    feature_columns=[embedded_text_feature_column],\n",
        "    optimizer=tf.optimizers.Adagrad(learning_rate=0.003),\n",
        "    loss_reduction=tf.losses.Reduction.SUM,\n",
        "    n_classes=2,\n",
        "    model_dir=model_dir)\n",
        "\n",
        "classifier.train(input_fn=train_input_fn, steps=1000)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "t4kfNFo-Ig7w",
        "colab_type": "text"
      },
      "source": [
        "In the next section, we'll apply bias-remediation techniques on our data and then train a revised model on the updated data."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "j1NHbpiMGJzX",
        "colab_type": "text"
      },
      "source": [
        "## Remediate Bias\n",
        "\n",
        "To remediate bias in our model, we'll first need to define the remediation metrics we'll use to gauge success and choose an appropriate remediation technique. Then we'll retrain the model using the technique we've selected.\n",
        "\n",
        "### Define the remediation metrics\n",
        "\n",
        "Before we can apply bias-remediation techniques to our model, we first need to define what successful remediation looks like in the context of our particular problem. As we saw in [**Fairness Exercise 1: Explore the Model**](https://colab.research.google.com/github/google/eng-edu/blob/master/ml/pc/exercises/fairness_text_toxicity_part1.ipynb?utm_source=external-colab&utm_campaign=colab-external&utm_medium=referral&utm_content=fairnessexercise1-colab), there are often tradeoffs that come into play when optimizing a model (for example, adjustments that decrease false positives may increase false negatives), so we need to choose the evaluation metrics that best align with our priorities.\n",
        "\n",
        "For our toxicity classifier, we've identified that our primary concern is ensuring that gender-related comments are not disproportionately misclassified as toxic, which could result in constructive discourse being suppressed. So here, we will define successful remediation as a **decrease in the FPR (false-positive rate) for gender subgroups relative to the overall FPR**.\n",
        "\n",
        "### Choose a remediation technique\n",
        "\n",
        "To mitigate false-positive rate for gender subgroups, we want to help the model \"unlearn\" any false correlations it's learned between gender-related terminology and toxicity. We've determined that this false correlation likely stems from an insufficient number of training examples in which gender terminology was used in nontoxic contexts. \n",
        "\n",
        "One excellent way to remediate this issue would be to add more nontoxic examples to each gender subgroup to balance out the dataset, and then retrain on the amended data. However, we've already trained on all the data we have, so what can we do? This is a common problem ML engineers face. Collecting additional data can be costly, resource-intensive, and time-consuming, and as a result, it may just not be feasible in certain circumstances.\n",
        "\n",
        "One alternative solution is to simulate additional data by *upweighting* the existing examples in the disproportionately underrepresented group (increasing the loss penalty for errors for these examples) so they carry more weight and are not as easily overwhelmed by the rest of the data.\n",
        "\n",
        "Let's update the input fuction of our model to implement upweighting for nontoxic examples belonging to one or more gender subgroups. In the `UPDATES FOR UPWEIGHTING` section of the code below, we've increased the `weight` values for nontoxic examples that contain a `gender` value of `transgender`, `female`, or `male`:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "SpCcmRPpGOmf",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "def train_input_fn_with_remediation():\n",
        "  def parse_function(serialized):\n",
        "    parsed_example = tf.io.parse_single_example(\n",
        "        serialized=serialized, features=FEATURE_MAP)\n",
        "    # Adds a weight column to deal with unbalanced classes.\n",
        "  \n",
        "    parsed_example['weight'] = tf.add(parsed_example[LABEL], 0.1)\n",
        "  \n",
        "    # BEGIN UPDATES FOR UPWEIGHTING\n",
        "    # Up-weighting non-toxic examples to balance toxic and non-toxic examples\n",
        "    # for gender slice.\n",
        "    #\n",
        "    values = parsed_example['gender'].values\n",
        "    # 'toxicity' label zero represents the example is non-toxic.\n",
        "    if tf.equal(parsed_example[LABEL], 0):\n",
        "      # We tuned the upweighting hyperparameters, and found we got good \n",
        "      # results by setting `weight`s of 0.4 for `transgender`, \n",
        "      # 0.5 for `female`, and 0.7 for `male`.\n",
        "      # NOTE: `other_gender` is not upweighted separately, because all examples\n",
        "      # tagged with `other_gender` were also tagged with one of the other\n",
        "      # values below\n",
        "      if tf.greater(tf.math.count_nonzero(tf.equal(values, 'transgender')), 0):\n",
        "        parsed_example['weight'] = tf.constant(0.4)\n",
        "      if tf.greater(tf.math.count_nonzero(tf.equal(values, 'female')), 0):\n",
        "        parsed_example['weight'] = tf.constant(0.5)\n",
        "      if tf.greater(tf.math.count_nonzero(tf.equal(values, 'male')), 0):\n",
        "        parsed_example['weight'] = tf.constant(0.7)\n",
        "        \n",
        "    return (parsed_example,\n",
        "            parsed_example[LABEL])\n",
        "  # END UPDATES FOR UPWEIGHTING\n",
        "\n",
        "  train_dataset = tf.data.TFRecordDataset(\n",
        "      filenames=[train_tf_file]).map(parse_function).batch(512)\n",
        "  return train_dataset"
      ],
      "execution_count": 0,
      "outputs": []
    },
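    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "To make the effect of the `weight` column more concrete, here is a minimal NumPy sketch of a weighted, sum-reduced log loss. It only illustrates the idea and is not the estimator's internal implementation; `weighted_log_loss` and the toy values are assumptions made for this example:\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "def weighted_log_loss(labels, probs, weights):\n",
        "  # Per-example binary cross-entropy, scaled by each example's weight\n",
        "  # and summed over the batch.\n",
        "  labels = np.asarray(labels, dtype=float)\n",
        "  probs = np.asarray(probs, dtype=float)\n",
        "  per_example = -(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))\n",
        "  return float(np.sum(np.asarray(weights, dtype=float) * per_example))\n",
        "\n",
        "# One nontoxic example (label 0) that the model wrongly scores as 0.9 toxic.\n",
        "label, prob = [0.0], [0.9]\n",
        "baseline_loss = weighted_log_loss(label, prob, [0.1])    # default nontoxic weight\n",
        "upweighted_loss = weighted_log_loss(label, prob, [0.5])  # upweighted example\n",
        "print(baseline_loss, upweighted_loss)\n",
        "```\n",
        "\n",
        "In this sketch, the same mistake contributes five times as much loss once the example's weight rises from 0.1 to 0.5, so training is pushed harder toward correcting it."
      ]
    },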
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "yJPiLTl_owWq",
        "colab_type": "text"
      },
      "source": [
        "### Retrain the model\n",
        "\n",
        "Now, let's retrain the model with our upweighted examples:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "WMxcTsjfa4s_",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "BASE_DIR = tempfile.gettempdir()\n",
        "  \n",
        "model_dir_with_remediation = os.path.join(BASE_DIR, 'train', datetime.now().strftime(\n",
        "    \"%Y%m%d-%H%M%S\"))\n",
        "\n",
        "embedded_text_feature_column = hub.text_embedding_column(\n",
        "    key=TEXT_FEATURE,\n",
        "    module_spec='https://tfhub.dev/google/nnlm-en-dim128/1')\n",
        "\n",
        "classifier_with_remediation = tf.estimator.DNNClassifier(\n",
        "    hidden_units=[500, 100],\n",
        "    weight_column='weight',\n",
        "    feature_columns=[embedded_text_feature_column],\n",
        "    n_classes=2,\n",
        "    optimizer=tf.optimizers.Adagrad(learning_rate=0.003),\n",
        "    loss_reduction=tf.losses.Reduction.SUM,\n",
        "    model_dir=model_dir_with_remediation)\n",
        "\n",
        "classifier_with_remediation.train(input_fn=train_input_fn_with_remediation, steps=1000)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tizoJkLmHt_L",
        "colab_type": "text"
      },
      "source": [
        "## Recompute fairness metrics"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NksJR1Wh_OQv",
        "colab_type": "text"
      },
      "source": [
        "Now that we've retrained the model, let's recompute our fairness metrics. First, export the model:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "6W3XZkurHtii",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "def eval_input_receiver_fn():\n",
        "  serialized_tf_example = tf.compat.v1.placeholder(\n",
        "      dtype=tf.string, shape=[None], name='input_example_placeholder')\n",
        "\n",
        "  receiver_tensors = {'examples': serialized_tf_example}\n",
        "\n",
        "  features = tf.io.parse_example(serialized_tf_example, FEATURE_MAP)\n",
        "  features['weight'] = tf.ones_like(features[LABEL])\n",
        "\n",
        "  return tfma.export.EvalInputReceiver(\n",
        "    features=features,\n",
        "    receiver_tensors=receiver_tensors,\n",
        "    labels=features[LABEL])\n",
        "\n",
        "tfma_export_dir_with_remediation = tfma.export.export_eval_savedmodel(\n",
        "  estimator=classifier_with_remediation,\n",
        "  export_dir_base=os.path.join(BASE_DIR, 'tfma_eval_model_with_remediation'),\n",
        "  eval_input_receiver_fn=eval_input_receiver_fn)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "G7K8fiLRApP-",
        "colab_type": "text"
      },
      "source": [
        "Next, run the fairness evaluation using TFMA:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "TNXasRqy_3wh",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "tfma_eval_result_path_with_remediation = os.path.join(BASE_DIR, 'tfma_eval_result_with_remediation')\n",
        "\n",
        "slice_selection = 'gender'\n",
        "compute_confidence_intervals = False\n",
        "\n",
        "# Define slices that you want the evaluation to run on.\n",
        "slice_spec = [\n",
        "    tfma.slicer.SingleSliceSpec(), # Overall slice\n",
        "    tfma.slicer.SingleSliceSpec(columns=['gender']),\n",
        "]\n",
        "\n",
        "# Add the fairness metrics.\n",
        "add_metrics_callbacks = [\n",
        "  tfma.post_export_metrics.fairness_indicators(\n",
        "      thresholds=[0.1, 0.3, 0.5, 0.7, 0.9],\n",
        "      labels_key=LABEL\n",
        "      )\n",
        "]\n",
        "\n",
        "eval_shared_model_with_remediation = tfma.default_eval_shared_model(\n",
        "    eval_saved_model_path=tfma_export_dir_with_remediation,\n",
        "    add_metrics_callbacks=add_metrics_callbacks)\n",
        "\n",
        "validate_dataset = tf.data.TFRecordDataset(filenames=[validate_tf_file])\n",
        "\n",
        "# Run the fairness evaluation.\n",
        "with beam.Pipeline() as pipeline:\n",
        "  _ = (\n",
        "      pipeline\n",
        "      | 'ReadData' >> beam.io.ReadFromTFRecord(validate_tf_file)\n",
        "      | 'ExtractEvaluateAndWriteResults' >>\n",
        "       tfma.ExtractEvaluateAndWriteResults(\n",
        "                 eval_shared_model=eval_shared_model_with_remediation,\n",
        "                 slice_spec=slice_spec,\n",
        "                 compute_confidence_intervals=compute_confidence_intervals,\n",
        "                 output_path=tfma_eval_result_path_with_remediation)\n",
        "  )\n",
        "\n",
        "eval_result_with_remediation = tfma.load_eval_result(output_path=tfma_eval_result_path_with_remediation)"
      ],
      "execution_count": 0,
      "outputs": []
    },
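    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "Fairness Indicators reports metrics such as FPR for each slice as well as overall. As a rough sketch of the quantity behind our success criterion (the arrays are made-up toy data, and `false_positive_rate` is a hypothetical helper, not part of TFMA or Fairness Indicators), FPR is the fraction of actual nontoxic examples that the model flags as toxic:\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "def false_positive_rate(labels, predictions):\n",
        "  # FPR = FP / (FP + TN): share of actual negatives predicted positive.\n",
        "  labels = np.asarray(labels)\n",
        "  predictions = np.asarray(predictions)\n",
        "  negatives = labels == 0\n",
        "  return float(np.mean(predictions[negatives] == 1))\n",
        "\n",
        "# Toy data (illustrative only): 1 = toxic, 0 = nontoxic.\n",
        "labels = np.array([0, 0, 0, 0, 1, 1, 0, 0])\n",
        "predictions = np.array([1, 0, 0, 0, 1, 0, 1, 1])\n",
        "in_subgroup = np.array([False, False, False, False, False, True, True, True])\n",
        "\n",
        "overall_fpr = false_positive_rate(labels, predictions)\n",
        "subgroup_fpr = false_positive_rate(labels[in_subgroup], predictions[in_subgroup])\n",
        "# Subgroup FPR relative to the overall FPR (1.0 means 100% higher).\n",
        "relative_gap = (subgroup_fpr - overall_fpr) / overall_fpr\n",
        "print(overall_fpr, subgroup_fpr, relative_gap)\n",
        "```\n",
        "\n",
        "Successful remediation, as defined earlier, would show up as this relative gap shrinking for the gender slices."
      ]
    },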
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "KbNjpj_jKyy9",
        "colab_type": "text"
      },
      "source": [
        "## Load evaluation results\n",
        "\n",
        "Run the following two cells to load results in the What-If tool and Fairness Indicators. \n",
        "\n",
        "In the What-If tool, we'll load 1,000 examples with the corresponding predictions returned from both the baseline model and the remediated model.\n",
        "\n",
        "#### **WARNING: When you launch the What-If tool widget below, the left panel will display the full text of individual comments from the Civil Comments dataset. Some of these comments include profanity, offensive statements, and offensive statements involving identity terms. If this is a concern, run the Alternative cell at the end of this section instead of the two code cells below, and skip question #4 in the following [Exercise](#scrollTo=QrecUFHfBCyC).**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "8fN3CWs4K1RG",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "DEFAULT_MAX_EXAMPLES = 1000\n",
        "\n",
        "# Load 100000 examples in memory. When first rendered, What-If Tool only\n",
        "# displays 1000 of these examples to ensure data loads successfully for most\n",
        "# browser/machine configurations. \n",
        "def wit_dataset(file, num_examples=100000):\n",
        "  dataset = tf.data.TFRecordDataset(\n",
        "      filenames=[train_tf_file]).take(num_examples)\n",
        "  return [tf.train.Example.FromString(d.numpy()) for d in dataset]\n",
        "\n",
        "wit_data = wit_dataset(train_tf_file)\n",
        "\n",
        "# Configure WIT with 1000 examples, the FEATURE_MAP we defined above, and\n",
        "# a label of 1 for positive (toxic) examples and 0 for negative (nontoxic)\n",
        "# examples\n",
        "config_builder = WitConfigBuilder(wit_data[:DEFAULT_MAX_EXAMPLES]).set_estimator_and_feature_spec(\n",
        "    classifier, FEATURE_MAP).set_compare_estimator_and_feature_spec(\n",
        "    classifier_with_remediation, FEATURE_MAP).set_label_vocab(['0', '1']).set_target_feature(LABEL)\n",
        "wit = WitWidget(config_builder)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YQLkQYsV27QH",
        "colab_type": "text"
      },
      "source": [
        "In Fairness Indicators, we'll display the remediated model's evaluation results on the validation set."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Yc66w8EmK6bc",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Link Fairness Indicators widget with WIT widget above,\n",
        "# so that clicking a slice in FI below will load its data in WIT above.\n",
        "event_handlers={'slice-selected':\n",
        "              wit.create_selection_callback(wit_data, DEFAULT_MAX_EXAMPLES)}\n",
        "widget_view.render_fairness_indicator(eval_result=eval_result_with_remediation,\n",
        "                                      slicing_column=slice_selection,\n",
        "                                      event_handlers=event_handlers)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "iREnWN4o8jtR",
        "colab_type": "code",
        "cellView": "form",
        "colab": {}
      },
      "source": [
        "#@title Alternative: Run this cell only if you intend to skip the What-If tool exercises (see Warning above)\n",
        "# Link Fairness Indicators widget with WIT widget above,\n",
        "# so that clicking a slice in FI below will load its data in WIT above.\n",
        "widget_view.render_fairness_indicator(eval_result=eval_result_with_remediation,\n",
        "                                      slicing_column=slice_selection)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QrecUFHfBCyC",
        "colab_type": "text"
      },
      "source": [
        "## Exercise: Analyze the results\n",
        "\n",
        "Use the What-If Tool and Fairness Indicators widgets above to answer the following questions."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_rFZv3FbgNwy",
        "colab_type": "text"
      },
      "source": [
        "#### **1. In [Fairness Exercise 1: Explore the Model](https://colab.research.google.com/github/google/eng-edu/blob/master/ml/pc/exercises/fairness_text_toxicity_part1.ipynb?utm_source=external-colab&utm_campaign=colab-external&utm_medium=referral&utm_content=fairnessexercise1-colab), our baseline model had an FPR of 0.28 overall and FPRs of 0.51 and 0.47 for `male` and `female` examples, respectively. In our revised model, what are the FPRs for `male` and `female` subgroups? How do these values compare to the overall FPR?** "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dV10ovJif-A3",
        "colab_type": "text"
      },
      "source": [
        "#### Solution\n",
        "\n",
        "Click below for the solution."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "A86oWS47gAHl",
        "colab_type": "text"
      },
      "source": [
        "When we evaluated our model against the validation set, we got an FPR of 0.28 for `male` and 0.24 for `female`. The overall FPR was 0.23.\n",
        "\n",
        "![FPR results for the revised model displayed in Fairness Indicators. The \"male\" and \"female\" FPR values in the table are circled, showing an FPR of 0.28 for \"male\" and 0.24 for \"female\", and an overall FPR of 0.23.](http://developers.google.com/machine-learning/practica/fairness-indicators/colab-images/fairness_indicators_colab2_exercise1.png)\n",
        "\n",
        "The FPR for `male` is now approximately 20% higher than the overall rate, and the FPR for `female` is now approximately 5% lower than the overall rate. This is a significant improvement over our previous model, where the FPRs for `male` and `female` were +83% and +69% higher, respectively, than the overall FPR. \n",
        "\n",
        "**NOTE:** *Model training is not deterministic, so your exact results may vary slightly from ours.*"
      ]
    },
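    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "The percentages in this solution are relative differences from the overall rate. Here is a quick sketch of that arithmetic, using the rounded FPR values quoted in this notebook (the widget works from unrounded values, so its percentages can differ by a point or so; `relative_difference` is a helper name made up for this example):\n",
        "\n",
        "```python\n",
        "def relative_difference(subgroup_rate, overall_rate):\n",
        "  # Signed difference of the subgroup rate from the overall rate,\n",
        "  # expressed as a fraction of the overall rate.\n",
        "  return (subgroup_rate - overall_rate) / overall_rate\n",
        "\n",
        "# Rounded FPRs reported in this notebook.\n",
        "baseline = {'overall': 0.28, 'male': 0.51, 'female': 0.47}\n",
        "revised = {'overall': 0.23, 'male': 0.28, 'female': 0.24}\n",
        "\n",
        "for name, rates in [('baseline', baseline), ('revised', revised)]:\n",
        "  for group in ('male', 'female'):\n",
        "    diff = relative_difference(rates[group], rates['overall'])\n",
        "    print(f'{name} {group}: {diff:+.0%}')\n",
        "```\n",
        "\n",
        "A positive result means the subgroup's FPR is above the overall FPR; a negative result means it is below."
      ]
    },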
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Oe9RSy_c56pz",
        "colab_type": "text"
      },
      "source": [
        "#### **2. What other metrics should we audit to confirm gender subgroup biases have been successfully remediated? What are the results on these metrics?**"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Amp8qxOA7d5R",
        "colab_type": "text"
      },
      "source": [
        "#### Solution\n",
        "\n",
        "Click below for the solution."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Pwb58a6T7gxa",
        "colab_type": "text"
      },
      "source": [
        "We should also review FNR. \n",
        "\n",
        "A model optimized solely to decrease FPR could learn to always predict the negative class (\"nontoxic\"), which would result in a FPR of 0. However, this would cause the FNR to skyrocket because every actual positive (\"toxic\") example would be misclassified and a false negative. \n",
        "\n",
        "While our primary metric for evaluating remediation is FPR, we still want to make sure we're OK with any tradeoff in increased FNR that we incur to decrease FPR.\n",
        "\n",
        "If we take a look at FNR results for the revised model, we see that the overall FNR is 0.34, `male` FNR is 1% lower at 0.33, and `female` FNR is 12% higher at 0.38. So we can confirm that our subgroup FNRs are not dramatically higher than overall FNR, and overall FNR itself is not sky-high.\n",
        "\n",
        "**NOTE:** *Model training is not deterministic, so your exact results may vary slightly from ours.*\n",
        "\n",
        "![False negative rate results for gender subgroups displayed in the Fairness Indicators widget. Overall FNR is 0.33, \"male\" FNR is 1.08485% lower at 0.33358, and \"female\" FNR is 11.86052% higher at 0.38](http://developers.google.com/machine-learning/practica/fairness-indicators/colab-images/fairness_indicators_colab2_exercise2.png)"
      ]
    },
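    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "The FPR/FNR tradeoff described in this solution can be illustrated with a small sketch. The scores below are toy values, and `fpr_fnr` is a hypothetical helper written for this example. Raising the decision threshold until nothing is classified as toxic drives FPR down to 0 and FNR up to 1:\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "def fpr_fnr(labels, scores, threshold):\n",
        "  # Classify as toxic when the score meets the threshold, then compute\n",
        "  # FPR over actual negatives and FNR over actual positives.\n",
        "  predicted_toxic = scores >= threshold\n",
        "  fpr = float(np.mean(predicted_toxic[labels == 0]))\n",
        "  fnr = float(np.mean(~predicted_toxic[labels == 1]))\n",
        "  return fpr, fnr\n",
        "\n",
        "# Toy scores (illustrative only): 1 = toxic, 0 = nontoxic.\n",
        "labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])\n",
        "scores = np.array([0.1, 0.2, 0.6, 0.8, 0.4, 0.7, 0.9, 0.95])\n",
        "\n",
        "for threshold in (0.5, 0.75, 1.01):\n",
        "  fpr, fnr = fpr_fnr(labels, scores, threshold)\n",
        "  print(f'threshold={threshold}: FPR={fpr:.2f}, FNR={fnr:.2f}')\n",
        "```\n",
        "\n",
        "This is why FNR is worth auditing alongside FPR: a drop in one can be paid for with a rise in the other."
      ]
    },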
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jNZvF7btDshi",
        "colab_type": "text"
      },
      "source": [
        "#### **3. Do you see any areas where further improvement is needed?** "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "KJob8ZofVnUC",
        "colab_type": "text"
      },
      "source": [
        "#### Solution\n",
        "\n",
        "Click below for one possible solution."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "k3Eo6rjLVqLd",
        "colab_type": "text"
      },
      "source": [
        "If we hover over the `other_gender` slice, as shown above, we see that there are only 6 examples in this slice. This is an extremely small number of examples in comparison to the `male` and `female` groups, which each have over 15,000 examples. \n",
        "\n",
        "![FNR results for gender subgroups displayed in the Fairness Indicators widget. A pop-up is displayed above the \"other gender\" slice, which shows an Example Count of 6 for this subgroup.](http://developers.google.com/machine-learning/practica/fairness-indicators/colab-images/fairness_indicators_colab2_exercise3.png)\n",
        "\n",
        "**NOTE:** *Model training is not deterministic, so your exact results may vary slightly from ours shown above.*\n",
        "\n",
        "With an `other_gender` slice this small, we can't make any statistically significant assertions about the model's performance on this subgroup (changing the classification of just one example would cause a swing of 16.6% in FNR or FPR). Upweighting is not sufficient here; we're going to need to add more examples to the `other_gender` subgroup that the model can learn from."
      ]
    },
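    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "As a back-of-the-envelope sketch of why slice size matters (`rate_swing` is a made-up helper, and the sketch assumes every example in the slice counts toward the rate's denominator, which is the best case for a 6-example slice):\n",
        "\n",
        "```python\n",
        "def rate_swing(num_examples_in_denominator):\n",
        "  # Change in a rate such as FPR or FNR when the classification of a\n",
        "  # single example in the denominator flips.\n",
        "  return 1.0 / num_examples_in_denominator\n",
        "\n",
        "small_slice_swing = rate_swing(6)      # e.g. a 6-example slice\n",
        "large_slice_swing = rate_swing(15000)  # e.g. a 15,000-example slice\n",
        "print(f'{small_slice_swing:.1%} vs. {large_slice_swing:.4%}')\n",
        "```\n",
        "\n",
        "Because FPR is computed only over actual negatives and FNR only over actual positives, the real denominators in a 6-example slice are 6 or fewer, so the swing from one example can be even larger."
      ]
    },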
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "USV_7odN-8Ru",
        "colab_type": "text"
      },
      "source": [
        "#### **4. Compare the performance of the baseline model and the revised model on the `female` subgroup as follows:**\n",
        "\n",
        "Click on the bar of the _female_ slice in the Fairness Indicators widget to load the corresponding individual female examples in the What-If Tool widget above. Create a scatterplot that plots toxicity scores for the baseline model (**Inference Score 1**) against toxicity scores for the revised model (**Inference Score 2**), with each example color-coded by ground-truth label (**toxicity**).\n",
        "\n",
        "#### **What trends can you identify from this graph?**"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "I-SJbapBEXPe",
        "colab_type": "text"
      },
      "source": [
        "#### Solution\n",
        "\n",
        "Click below for a solution."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "So73VwuSEYwO",
        "colab_type": "text"
      },
      "source": [
        "Here's our graph, with toxicity scores for the baseline model plotted along the x-axis, and toxicity scores for the revised model plotted along the y-axis. Actual toxic examples are colored red, and actual nontoxic examples are colored blue.\n",
        "\n",
        "**NOTE:** *Model training is not deterministic, so your exact results may vary slightly from ours.*\n",
        "\n",
        "![Scatterplot in the What-If tool, plotting toxicity score of the baseline model along the x-axis (\"Scatter | X-Axis\" set to \"Inference score 1\") and toxicity score of the revised model along the y-axis (\"Scatter | X-Axis\" set to \"Inference score 2\", with \"Color By\" set to \"toxicity\" so that examples are color-coded by their actual toxicity labels. The relationship between the two scores is generally linear, with a few clusters of negative-example outliers circled where toxicity score is significantly lower for the revised model.](http://developers.google.com/machine-learning/practica/fairness-indicators/colab-images/wit_colab2_exercise4.png)\n",
        "\n",
        "The relationship between the two scores is generally linear, but we can see a few clusters of blue outliers (circled above) where the revised model predicts a significantly lower toxicity score than the baseline model. We can extrapolate that the revised model does a better job of predicting low toxicity scores for a percentage of nontoxic `female` examples (though there's still room for further improvement)."
      ]
    }
  ]
}
